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Abstract 

A  general  framework  for  solving  image  inverse  problems  is  introduced  in  this  paper.  The  approach  is 
based  on  Gaussian  mixture  models,  estimated  via  a  computationally  efficient  MAP-EM  algorithm.  A  dual 
mathematical  interpretation  of  the  proposed  framework  with  structured  sparse  estimation  is  described,  which 
shows  that  the  resulting  piecewise  linear  estimate  stabilizes  the  estimation  when  compared  to  traditional 
sparse  inverse  problem  techniques.  This  interpretation  also  suggests  an  effective  dictionary  motivated  initial¬ 
ization  for  the  MAP-EM  algorithm.  We  demonstrate  that  in  a  number  of  image  inverse  problems,  including 
inpainting,  zooming,  and  deblurring,  the  same  algorithm  produces  either  equal,  often  significantly  better,  or 
very  small  margin  worse  results  than  the  best  published  ones,  at  a  lower  computational  cost. 


L  Introduction 

Image  restoration  often  requires  to  solve  an  inverse  problem.  It  amounts  to  estimate  an  image  f  from  a 
measurement 

y  =  Uf+ w, 

obtained  through  a  non-invertible  linear  degradation  operator  U,  and  contaminated  by  an  additive  noise 
w.  Typical  degradation  operators  include  masking,  subsampling  in  a  uniform  grid  and  convolution,  the 
corresponding  inverse  problems  often  named  inpainting  or  interpolation,  zooming  and  deblurring.  Estimating 
f  requires  some  prior  information  on  the  image,  or  equivalently  image  models.  Finding  good  image  models 
is  therefore  at  the  heart  of  image  estimation. 

Mixture  models  are  often  used  as  image  priors  since  they  enjoy  the  flexibility  of  signal  description  by  as- 
sunfing  that  the  signals  are  generated  by  a  mixture  of  probability  distributions  Gaussian  mixture  models 
(GMM)  have  been  shown  to  provide  powerful  tools  for  data  classification  and  segmentation  applications  (see 
for  example  l[T4ll.  |[30ll.  |[54ll.  Il5^).  however,  they  have  not  yet  been  shown  to  generate  state-of-the-art  in  a 
general  class  of  inverse  problems.  Ghahramani  and  Jordan  have  applied  GMM  for  learning  from  incomplete 
data,  i.e.,  images  degraded  by  a  masking  operator,  and  have  shown  good  classification  results,  however,  it 
does  not  lead  to  state-of-the-art  inpainting  OTIl.  Portilla  et  al.  have  shown  image  denoising  impressive  results 
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by  assuming  Gaussian  scale  mixture  models  (deviating  from  GMM  by  assuming  different  scale  factors  in 
the  mixture  of  Gaussians)  on  wavelet  representations  ll55ll.  and  have  recently  extended  its  applications  on 
image  deblurring  |[32l.  Recently,  Zhou  et  al.  have  developed  an  nonparametric  Bayesian  approach  using 
more  elaborated  models,  such  as  beta  and  Dirichlet  processes,  which  leads  to  excellent  results  in  denoising 
and  inpainting  iTTTIl. 

The  now  popular  sparse  signal  models,  on  the  other  hand,  assume  that  the  signals  can  be  accurately 
represented  with  a  few  coefficients  selecting  atoms  in  some  dictionary  B31l.  Recently,  very  impressive  image 
restoration  results  have  been  obtained  with  local  patch-based  sparse  representations  calculated  with  dictio¬ 
naries  learned  from  natural  images  lO,  |[2^.  ll43l.  Relative  to  pre-fixed  dictionaries  such  as  wavelets  14511. 
curvelets  im,  and  bandlets  l46l.  learned  dictionaries  enjoy  the  advantage  of  being  better  adapted  to  the 
images,  thereby  enhancing  the  sparsity.  However,  dictionary  learning  is  a  large-scale  and  highly  non-convex 
problem.  It  requires  high  computational  complexity,  and  its  mathematical  behavior  is  not  yet  well  understood. 
In  the  dictionaries  aforementioned,  the  actual  sparse  image  representation  is  calculated  with  relatively 
expensive  non-linear  estimations,  such  as  /i  or  matching  pursuits  IT^.  l22ll.  l4^.  More  importantly,  as 
will  be  reviewed  in  Section  IIII-Ai  with  a  full  degree  of  freedom  in  selecting  the  approximation  space  (atoms 
of  the  dictionary),  non-linear  sparse  inverse  problem  estimation  may  be  unstable  and  imprecise  due  to  the 
coherence  of  the  dictionary  HTIl. 

Structured  sparse  image  representation  models  further  regularize  the  sparse  estimation  by  assuming  de¬ 
pendency  on  the  selection  of  the  active  atoms.  One  simultaneously  selects  blocks  of  approximation  atoms, 
thereby  reducing  the  number  of  possible  approximation  spaces  |[4l,  |[25ll.  l[26l.  13511.  If36ll.  15^.  These 
structured  approximations  have  been  shown  to  improve  the  signal  estimation  in  a  compressive  sensing  context 
for  a  random  operator  U.  However,  for  more  unstable  inverse  problems  such  as  zooming  or  deblurring, 
this  regularization  by  itself  is  not  sufficient  to  reach  state-of-the-art  results.  Recently  some  good  image 
zooming  results  have  been  obtained  with  structured  sparsity  based  on  directional  block  structures  in  wavelet 
representations  m.  However,  this  directional  regularization  is  not  general  enough  to  be  extended  to  solve 
other  inverse  problems. 

This  work  shows  that  the  Gaussian  mixture  models  (GMM),  estimated  via  an  MAP-EM  (maximum  a 
posteriori  expectation-maximization)  algorithm,  lead  to  results  in  the  same  ballpark  as  the  state-of-the- 
art  in  a  number  of  imaging  inverse  problems,  at  a  lower  computational  cost.  The  MAP-EM  algorithm  is 
described  in  Section  [III  After  briefly  reviewing  sparse  inverse  problem  estimation  approaches,  a  mathematical 
equivalence  between  the  proposed  piecewise  linear  estimation  (PEE)  from  GMM/MAP-EM  and  structured 
sparse  estimation  is  shown  in  Section  [Till  This  connection  shows  that  PEE  stabilizes  the  sparse  estimation 
with  a  structured  learned  overcomplete  dictionary  composed  of  a  union  of  PCA  (Principal  Component 
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Analysis)  bases,  and  with  collaborative  prior  information  incorporated  in  the  eigenvalues,  that  privileges  in 
the  estimation  the  atoms  that  are  more  likely  to  be  important.  This  interpretation  suggests  also  an  effective 
dictionary  motivated  initialization  for  the  MAP-EM  algorithm.  In  Section  [IV]  we  support  the  importance  of 
different  components  of  the  proposed  PLE  via  some  initial  experiments.  Applications  of  the  proposed  PEE 
in  image  inpainting,  zooming,  and  deblurring  are  presented  in  sections  |Vl  |Vll  and  IVIII  respectively,  and  are 
compared  with  previous  state-of-the-art  methods.  Conclusions  are  drawn  in  Section  IVIIIl 


II.  Piecewise  Linear  Estimation 


This  section  describes  the  Gaussian  mixture  models  (GMM)  and  the  MAP-EM  algorithm,  which  lead  to 
the  proposed  piecewise  linear  estimation  (PLE). 

A.  Gaussian  Mixture  Model 

Natural  images  include  rich  and  non- stationary  content,  whereas  when  restricted  to  local  windows,  image 
structures  appear  to  be  simpler  and  are  therefore  easier  to  model.  Eollowing  some  previous  works  El,  ifTOl. 
El,  an  image  is  decomposed  into  overlapping  y/N  x  ^/N  local  patches 


y/  =  U/f/  +  w/, 


(1) 


where  U/  is  the  degradation  operator  restricted  to  the  patch  /,  y/,  f/  and  W/  are  respectively  the  degraded, 
original  image  patches  and  the  noise  restricted  to  the  patch,  with  !</</,/  being  the  total  number  of 
patches.  Treated  as  a  signal,  each  of  the  patches  is  estimated,  and  their  corresponding  estimates  are  finally 
combined  and  averaged,  leading  to  the  estimate  of  the  image. 

GMM  describes  local  image  patches  with  a  mixture  of  Gaussian  distributions.  Assume  there  exist  K 
Gaussian  distributions  {lJik’>'^k)}i<k<K  parametrized  by  their  means  jik  and  covariances  JLk-  Each  image 
patch  f/  is  independently  drawn  from  one  of  these  Gaussians  with  an  unknown  index  k,  whose  probability 
density  function  is 


(2) 


Estimating  from  {y/}i</</  can  then  be  casted  into  the  following  problems: 

•  Estimate  the  Gaussian  parameters  {{lik^'^k)}i<k<K,  from  the  degraded  data  {y/}i<K/. 

•  Identify  the  Gaussian  distribution  ki  that  generates  the  patch  /,  VI  <  /  <  /. 

•  Estimate  f/  from  its  corresponding  Gaussian  distribution  (/r/..,Ey^.),  VI  <  /  <  /. 

These  problems  are  overall  non-convex.  The  next  section  will  present  a  maximum  a  posteriori  expectation- 
maximization  (MAP-EM)  algorithm  that  calculates  a  local-minimum  solution  |[3|. 
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B.  MAP-EM  Algorithm 

Following  an  initialization,  addressed  in  Section  Illl-Ci  the  MAP-EM  algorithm  is  an  iterative  procedure 
that  alternates  between  two  steps: 

•  In  the  E-step,  assuming  that  the  estimates  of  the  Gaussian  parameters  {{fikjEk)}i<k<K  are  known 

(following  the  previous  M-step),  for  each  patch  one  calculates  the  maximum  a  posteriori  (MAP) 
estimates  with  all  the  Gaussian  models,  and  selects  the  best  Gaussian  model  ki  to  obtain  the  estimate 
of  the  patch  %  =  . 

•  In  the  M-step,  assuming  that  the  Gaussian  model  selection  ki  and  the  signal  estimate  f/,  V/,  are  known 
(following  the  previous  E-step),  one  estimates  (updates)  the  Gaussian  models  {{jlk^^k)}i<k<K- 

1 )  E-step:  Signal  Estimation  and  Model  Selection:  In  the  E-step,  the  estimates  of  the  Gaussian  parameters 
{{ilkj^k)}i<k<K  are  assumed  to  be  known.  To  simplify  the  notation,  we  assume  without  loss  of  generality 
that  the  Gaussians  have  zero  means  Pk  =  0,  as  one  can  always  center  the  image  patches  with  respect  to  the 
means. 

Eor  each  image  patch  /,  the  signal  estimation  and  model  selection  is  calculated  to  maximize  the  log 
a-posteriori  probability  log;?(f/|y/,Ey^): 

{fi,ki)  =  argmaxlog/>(f,|yi,i;jt)  =  argmax(log/>(y,|f,,i;;i)+log/>(f,|i;jt)) 

f,k  f,k 

=  argrmn{\\\Jifi-yif  +  <jHJtii^(i  +  G^log\tk\),  (3) 

where  the  second  equality  follows  the  Bayes  rule  and  the  third  one  is  derived  with  the  assumption  that 
w/  ~  with  Id  the  identity  matrix,  and  f/  ^  ./F(0,Ey^). 

The  maximization  is  first  calculated  over  f/  and  then  over  k.  Given  a  Gaussian  signal  model  f/  ~  ^(0,Ey^), 
it  is  well  known  that  the  MAP  estimate 

ff  =  argnun  (||U,f,-  - y,-f  +  o^fj t^-^)  (4) 

minimizes  the  risk  £'[||f^  —  f/|p]  ll45]l.  One  can  verify  that  the  solution  to  dH)  can  be  calculated  with  a  linear 
filtering 

ff  =  W,,y.-,  (5) 

where 

W,,.  =  (UfU,-  +  cT%-i)-iuf  (6) 

is  a  Wiener  filter  matrix.  Since  UfU/  is  semi-positive  definite,  UfU/ +  is  positive  definite  and  its 

inverse  is  well  defined,  if  is  full  rank. 

The  best  Gaussian  model  ki  that  generates  the  maximum  MAP  probability  among  all  the  models  is  then 
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selected  with  the  estimated 

ki  =  argirun  (|||U;ff  -  y||^  +  +  a^log \tk\y  (7) 

The  signal  estimate  is  obtained  by  plugging  in  the  best  model  ki  in  the  MAP  estimate  dH) 

=  (8) 


The  whole  E-step  is  basically  calculated  with  a  set  of  linear  filters.  For  typical  applications  such  as 
zooming  and  deblurring  where  the  degradation  operators  U/  are  translation-invariant  and  do  not  depend  on 
the  patch  index  /,  i.e.,  U/  =  U,  the  Wiener  filter  matrices  dS])  can  be  precomputed  for  the  K 

Gaussian  distributions.  Calculating  dS])  thus  requires  only  2N^  floating-point  operations  (flops),  where  N  is 
the  image  patch  size.  For  a  translation-variant  degradation  U/,  random  masking  for  example,  Wy^.  needs  to  be 
calculated  at  each  position  where  U/  changes.  Since  UfU/  +  (7^E^^  is  positive  definite,  the  matrix  inversion 
can  be  implemented  with  N^/3  +  2N^  ^N^/3  flops  through  a  Cholesky  factorization  |[9l.  All  this  makes  the 
E-step  computationally  efficient. 

2)  M-step:  Model  Estimation:  In  the  M-step,  the  Gaussian  model  selection  ki  and  the  signal  estimate  f/ 
of  all  the  patches  are  assumed  to  be  known.  Let  be  the  ensemble  of  the  patch  indices  i  that  are  assigned 
to  the  ^-th  Gaussian  model,  i.e.,  and  let  \^k\  be  its  cardinality.  The  parameters  of  each 

Gaussian  model  are  estimated  with  the  maximum  likelihood  estimate  using  all  the  patches  assigned  to  that 
Gaussian  cluster, 

=  argmaxlogp{{fi}i^^i^\llk,'^k)-  (9) 

With  the  Gaussian  model  dS  ,  one  can  easily  verify  that  the  resulting  estimate  is  the  empirical  estimate 


A*  =  ^  L  t  and  tk 


(10) 


The  empirical  covariance  estimate  may  be  improved  through  regularization  when  there  is  lack  of  data  lISTll 
(for  typical  patch  size  8  x  8,  the  dimension  of  the  covariance  matrix  Ek  is  64  x  64,  while  the  \^k\  is  typically 
in  the  order  of  a  few  hundred).  A  simple  and  standard  eigenvalue-based  regularization  is  used  here,  ^ 
Ek  +  eld,  where  8  is  a  small  constant.  The  regularization  also  guarantees  that  the  estimate  of  the  covariance 
matrix  is  full-rank,  so  that  the  Wiener  filter  dS)  is  always  well  defined.  This  is  important  for  the  Gaussian 
model  selection  dZl)  as  well,  since  if  JLk  is  not  full  rank,  then  log|Ey^|  ^  — oo,  biasing  the  model  selection. 
The  computational  complexity  of  the  M-step  is  negligible  with  respect  to  the  E-step. 

As  the  MAP-EM  algorithm  described  above  iterates,  the  MAP  probability  of  the  observed  signals 
p{{ii}i<i<i\{yi}i<i<h{h^^k}i<k<K)  always  increases.  This  can  be  observed  by  interpreting  the  E-  and  M- 
steps  as  a  coordinate  descent  optimization  1341.  In  the  experiments,  the  convergence  of  the  patch  clustering 


June  15,  2010 


DRAFT 


6 


and  resulting  PSNR  is  always  observed. 

III.  PLE  AND  Structured  Sparse  Estimation 

The  MAP-EM  algorithm  described  above  requires  an  initialization.  A  good  initialization  is  highly  important 
for  iterative  algorithms  that  try  to  solve  non-convex  problems,  and  remains  an  active  research  topic  Q,  12^. 
This  section  describes  a  dual  structured  sparse  interpretation  of  GMM  and  MAP-EM,  which  suggests  an 
effective  dictionary  motivated  initialization  for  the  MAP-EM  algorithm.  Moreover,  it  shows  that  the  resulting 
piecewise  linear  estimate  stabilizes  traditional  sparse  inverse  problem  estimation. 

The  sparse  inverse  problem  estimation  approaches  will  be  first  reviewed.  After  describing  the  connection 
between  MAP-EM  and  structured  sparsity  via  estimation  in  PCA  bases,  an  intuitive  and  effective  initialization 
will  be  presented. 

A.  Sparse  Inverse  Problem  Estimation 

Traditional  sparse  super-resolution  estimation  in  dictionaries  provides  effective  non-parametric  approaches 
to  inverse  problems,  although  the  coherence  of  the  dictionary  and  their  large  degree  of  freedom  may  become 
sources  of  instability  and  errors.  These  algorithms  are  briefly  reviewed  in  this  section.  “Super-resolution”  is 
loosely  used  here  as  these  approaches  try  to  recover  information  that  is  lost  after  the  degradation. 

A  signal  f  G  is  estimated  by  taking  advantage  of  prior  information  which  specifies  a  dictionary  D  G 
RAx|r|,  having  |r|  columns  corresponding  to  atoms  {0m}mGr,  where  f  has  a  sparse  approximation.  This 
dictionary  may  be  a  basis  or  some  redundant  frame,  with  |r|  >  N.  Sparsity  means  that  f  is  well  approximated 
by  its  orthogonal  projection  fA  over  a  subspace  Va  generated  by  a  small  number  |A|  <C  |r|  of  column  vectors 
{0m}mGA  C)f  D. 

f  =  fA  +  ^A  =  D(a*  1a)  +  £a5  (11) 

where  a  G  is  the  transform  coefficient  vector,  a  •  1a  selects  the  coefficients  in  A  and  sets  the  others  to 
zero,  D(a*lA)  multiplies  the  matrix  D  with  the  vector  a  - 1a,  and  ||ea|P  ||f|P  is  a  small  approximation 
error. 

Sparse  inversion  algorithms  try  to  estimate  from  the  degraded  signal  y  =  Uf  +  w  the  support  A  and  the 
coefficients  a  in  A  that  specify  the  projection  of  f  in  the  approximation  space  Va-  It  results  from  ([TT])  that 

y  =  UD(a-lA)  +  E',  with  8'  =  U8  +  w.  (12) 

This  means  that  y  is  well  approximated  by  the  same  sparse  set  A  of  atoms  and  the  same  coefficients  a  in 
the  transformed  dictionary  UD,  whose  columns  are  the  transformed  vectors  {V^m}mer- 

Since  U  is  not  an  invertible  operator,  the  transformed  dictionary  UD  is  redundant,  with  column  vectors 
which  are  linearly  dependent.  It  results  that  y  has  an  infinite  number  of  possible  decompositions  in  UD. 
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A  sparse  approximation  y  =  UDa  of  y  can  be  calculated  with  a  basis  pursuit  algorithm  which  minimizes  a 
Lagrangian  penalized  by  a  sparse  /i  norm  fl^.  ll^ 

a  =  argmin||UDa  — y|p  +  A  ||a||i,  (13) 

a 

or  with  faster  greedy  matching  pursuit  algorithms  The  resulting  sparse  estimation  of  f  is 

f=Da.  (14) 

As  we  explain  next,  this  simple  approach  is  not  straightforward  and  often  not  as  effective  as  it  seems.  The 
Restrictive  Isometry  Property  of  Candes  and  Tao  ifT^  and  Donoho  ll^  is  a  strong  sufficient  condition  which 
guarantees  the  correctness  of  the  penalized  /i  estimation.  This  restrictive  isometry  property  is  valid  for  certain 
classes  of  operators  U,  but  not  for  important  structured  operators  such  as  subsampling  on  a  uniform  grid  or 
convolution.  For  structured  operators,  the  precision  and  stability  of  this  sparse  inverse  estimation  depends 
upon  the  “geometry”  of  the  approximation  support  A  of  f,  which  is  not  well  understood  mathematically, 
despite  some  sufficient  exact  recovery  conditions  proved  for  example  by  Tropp  |l62|,  and  many  others  (mostly 
related  to  the  coherence  of  the  equivalent  dictionary).  Nevertheless,  some  necessary  qualitative  conditions 
for  a  precise  and  stable  sparse  super-resolution  estimate  (fT4b  can  be  deduced  as  follows  Il45]l.  Il47ll: 

•  Sparsity.  D  provides  a  sparse  representation  for  f. 

•  Recoverability.  The  atoms  have  non  negligible  norms  \\IJ (j)m\\'^  0.  If  the  degradation  operator  U 

applied  to  leaves  no  “trace,”  the  corresponding  coefficient  a[m]  can  not  be  recovered  from  y  with  (fT3]). 
We  will  see  in  the  next  subsection  that  this  recoverability  property  of  transformed  relevant  atoms  having 
sufficient  energy  is  critical  for  the  GMM/MAP-EM  introduced  in  the  previous  section  as  well. 

•  Stability.  The  transformed  dictionary  UD  is  incoherent  enough.  Sparse  inverse  problem  estimation  may 
be  unstable  if  some  columns  {lJ(j)m}meT  in  UD  are  too  similar.  To  see  this,  let  us  imagine  a  toy  example, 
where  a  constant- value  atom  and  a  highly  oscillatory  atom  (with  values  —1,1,— 1,1,...),  after  a  x2 
subsampling,  become  identical.  The  sparse  estimation  ([13])  can  not  distinguish  between  them,  which 
results  in  an  unstable  inverse  problem  estimate  ([T4b.  The  coherence  of  UD  depends  on  D  as  well  as  on 
the  operator  U.  Regular  operators  U  such  as  subsampling  on  a  uniform  grid  and  convolution,  usually 
lead  to  a  coherent  UD,  which  makes  accurate  inverse  problem  estimation  difficult. 

Several  authors  have  applied  this  sparse  super-resolution  framework  ([T3I)  and  (O  for  image  inverse 
problems.  Sparse  estimation  in  dictionaries  of  curvelet  frames  and  DCT  have  been  applied  successfully 
to  image  inpainting  ||24l,  ll27ll.  |[33]|.  However,  for  uniform  grid  interpolations.  Section  [Vl|  shows  that  the 
resulting  interpolation  estimations  are  not  as  precise  as  simple  linear  bicubic  interpolations.  A  contourlet 
zooming  algorithm  iBTl  can  provide  a  slightly  better  PSNR  than  a  bicubic  interpolation,  but  the  results  are 
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considerably  below  the  state-of-the-art.  Learned  dictionaries  of  image  patches  have  generated  good  inpainting 
results  ll43ll.  IItTII.  In  some  recent  works  sparse  super-resolution  algorithms  with  learned  dictionary  have  been 
studied  for  zooming  and  deblurring  HU.  |[65ll.  As  shown  in  sections  IVTI  and  IVlIl  although  they  sometimes 
produce  good  visual  quality,  they  often  generate  artifacts  and  the  resulting  PSNRs  are  not  as  good  as  more 
standard  methods. 

Another  source  of  instability  of  these  algorithms  comes  from  their  full  degree  of  freedom.  The  non-linear 
approximation  space  Va  is  estimated  by  selecting  the  approximation  support  A,  with  basically  no  constraint. 
A  selection  of  |A|  atoms  from  a  dictionary  of  size  |r|  thus  corresponds  to  a  choice  of  an  approximation  space 
among  (j^j)  possible  subspaces.  In  a  local  patch-based  sparse  estimation  with  8x8  patch  size,  typical  values 
of  |r|  =  256  and  |A|  =  8  lead  to  a  huge  degree  of  freedom  ~  10^^,  further  stressing  the  inaccuracy  of 
estimating  a  from  an  UD. 

These  issues  are  addressed  with  the  proposed  PLE  framework  and  its  mathematical  connection  with 
structured  sparse  models  described  next. 


B.  Structured  Sparse  Estimation  in  PCA  bases 

The  PCA  bases  bridge  the  GMM/MAP-EM  framework  presented  in  Section  [II|  with  the  sparse  estimation 
described  above.  Eor  signals  {f/}  following  a  statistical  distribution,  a  PCA  basis  is  defined  as  the  matrix 
that  diagonalizes  the  data  covariance  matrix  ], 

=  (15) 

where  B/.  is  the  PCA  basis  and  Sk  =  diag(Af , . . . ,  is  a  diagonal  matrix,  whose  diagonal  elements  Af  > 
Al  >  . . .  >  A^  are  the  sorted  eigenvalues.  It  can  be  shown  that  the  PCA  basis  is  orthonormal,  i.e.,  =  Id, 

each  of  its  columns  0^,  I  <m<N,  being  an  atom  that  represents  one  principal  direction.  The  eigenvalues 
are  non-negative,  A^  >  0,  and  measure  the  energy  of  the  signals  {f/}  in  each  of  the  principal  directions  H3]l. 

Transforming  f/  from  the  canonical  basis  to  the  PCA  basis  =  B^f/,  one  can  verify  that  the  MAP 
estimate  dJ])-®  can  be  equivalently  calculated  as 

^  =  (16) 


where,  following  simple  algebra  and  calculus,  the  MAP  estimate  of  the  PCA  coefficients  is  obtained  by 


af  =  arg  imn  [  1 1  U/B^a,-  -  y,  f  ^ 


A* 

m=l 


(17) 


Comparing  (fTTl)  with  (fTSl).  the  MAP-EM  estimation  can  thus  be  interpreted  as  a  structured  sparse  es¬ 
timation.  As  illustrated  in  Eigure  [B  the  proposed  dictionary  has  the  advantage  of  the  traditional  learned 
overcomplete  dictionaries  being  overcomplete,  and  adapted  to  the  image  under  test  thanks  to  the  Gaussian 
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Fig.  1.  Left:  Traditional  overcomplete  dictionary.  Each  column  represents  an  atom  in  the  dictionary.  Non-linear 
estimation  has  the  full  degree  of  freedom  to  select  any  combination  of  atoms  (marked  by  the  columns  in  red).  Right: 
The  underlying  structured  sparse  piecewise  linear  dictionary  of  the  proposed  approach.  The  dictionary  is  composed  of 
a  family  of  PC  A  bases  whose  atoms  are  pre-ordered  by  their  associated  eigenvalues.  For  each  image  patch,  an  optimal 
linear  estimator  is  calculated  in  each  PCA  basis  and  the  best  linear  estimate  among  the  bases  is  selected  (marked  by 
the  basis  in  red). 


model  estimation  in  the  M-step  (which  is  equivalent  to  updating  the  PCAs),  but  the  resulting  piecewise 
linear  estimator  (PLE)  is  more  structured  than  the  traditional  nonlinear  sparse  estimation.  PLE  is  calculated 
with  a  linear  estimation  in  each  basis  and  a  non-linear  best  basis  selection: 

•  Nonlinear  block  sparsity.  The  dictionary  is  composed  of  a  union  of  K  PCA  bases.  To  represent  an 
image  patch,  the  non-linear  model  selection  ([3])  in  the  E-step  restricts  the  estimation  to  only  one  basis 
{N  atoms  out  of  KN  selected  in  group),  and  has  a  degree  of  freedom  equal  to  K,  sharply  reduced  from 
that  in  the  traditional  sparse  estimation  which  has  the  full  degree  of  freedom  in  atom  selection. 

•  Linear  collaborative  filtering.  Inside  each  PCA  basis,  the  atoms  are  pre-ordered  by  their  associated 

eigenvalues  (which  decay  very  fast  as  we  will  later  see,  leading  to  sparsity  inside  the  block  as  well).  In 
contrast  to  the  non-linear  sparse  /i  estimation  ([T3]).  the  MAP  estimate  (fTTl)  implements  the  regularization 
with  the  h  norm  of  the  coefficients  weighted  by  the  eigenvalues  and  is  calculated  with  a 

linear  filtering  ([5])  ([6l).  The  eigenvalues  are  computed  from  all  the  signals  {f/}  in  the  same  Gaussian 
distribution  class.  The  resulting  estimation  therefore  implements  a  collaborative  filtering  which  incorpo¬ 
rates  the  information  from  all  the  signals  in  the  same  cluster  Ull.  The  weighting  scheme  privileges  the 
coefficients  a/[m]  corresponding  to  the  principal  directions  with  large  eigenvalues  where  the  energy 
is  likely  to  be  high,  and  penalizes  the  others.  Eor  the  ill-posed  inverse  problems,  the  collaborative  prior 
information  incorporated  in  the  eigenvalues  {A^}i<^<a^  further  stabilizes  the  estimate.  (Note  that  this 
collaborative  weighting  is  fundamentally  different  than  the  standard  one  used  in  iterative  weighted  h 
approaches  to  sparse  coding  |[20l.) 

As  described  in  Section  HD  the  complexity  of  the  MAP-EM  algorithm  is  dominated  by  the  E-step.  Eor  an 
image  patch  size  of  VTv  x  \/N  (typical  value  8  x  8),  it  costs  2KN'^  flops  for  translation-invariant  degradation 
operators  such  as  uniform  subsampling  and  convolution,  and  KN^/3  flops  for  translation- variant  operators 
such  as  random  masking,  where  K  is  the  number  of  PCA  bases.  The  overall  complexity  is  therefore  tightly 
upper  bounded  by  ff{2LKN^)  or  ff{LKN^/3),  where  L  is  the  number  of  iterations.  As  will  be  shown  in 
Section  [IVl  the  algorithm  converges  fast  for  image  inverse  problems,  typically  in  L  =  3  to  5  iterations.  On 
the  other  hand,  the  complexity  of  the  /i  minimization  with  the  same  dictionary  is  with  typically 

a  large  factor  in  front  as  the  /i  converges  slowly  in  practice.  The  MAP-EM  algorithm  is  thus  typically  one 
or  two  orders  of  magnitude  faster  than  the  sparse  estimation. 
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To  conclude,  let  as  come  back  to  the  recoverability  property  mentioned  in  the  previous  section.  We  see 
from  (fTTl)  that  if  an  eigenvector  of  the  covariance  matrix  is  killed  by  the  operator  U/,  then  its  contribution  to 
the  recovery  of  y/  is  virtually  null,  while  it  pays  a  price  proportional  to  the  corresponding  eigenvalue.  Then, 
it  will  not  be  used  in  the  optimization  (fTTl).  and  thereby  in  the  reconstruction  of  the  signal  following  (fT^. 
This  means  that  the  wrong  model  might  be  selected  and  an  improper  reconstruction  obtained.  This  further 
stresses  the  importance  of  a  correct  design  of  dictionary  elements,  which  from  the  description  just  presented, 
it  is  equivalent  to  the  correct  design  of  the  covariance  matrix,  including  the  initialization,  which  is  described 
next. 

C.  Initialization  of  MAP -EM 

The  PCA  formulation  just  described  not  only  reveals  the  connection  between  the  MAP-EM  estimator  and 
structured  sparse  modeling,  but  it  is  crucial  for  understanding  how  to  initialize  the  Gaussian  models  as  well. 

1 )  Sparsity:  As  explained  in  Section  IIII-Ai  for  the  sparse  inverse  problem  estimation  model  to  have  the 
super-resolution  ability,  the  first  requirement  on  the  dictionary  is  to  be  able  to  provide  sparse  representations 
of  the  image.  It  has  been  shown  that  capturing  image  directional  regularity  is  highly  important  for  sparse 
representations  O,  lITTIl.  Hbll.  In  dictionary  learning,  for  example,  most  prominent  atoms  look  like  local  edges 
good  at  representing  contours,  as  illustrated  in  Figure  [2l-(a).  Therefore  the  initial  PCAs  in  our  framework, 
which  following  (fTSl)  will  lead  to  the  initial  Gaussians,  are  designed  to  capture  image  directional  regularity. 


(a)  (b)  (c)  (d) 


Fig.  2.  (a)  Some  typical  dictionary  atoms  learned  from  the  image  Lena  (Figure  [3]-(a))  with  K-SVD  O.  (b)-(d)  A 

numerical  procedure  to  obtain  the  initial  directional  PCAs.  (b)  A  synthetic  edge  image.  Patches  (8x8)  that  touch  the 
edge  are  used  to  calculate  an  initial  PCA  basis,  (c)  The  first  8  atoms  in  the  PCA  basis  with  the  largest  eigenvalues, 
(d)  Typical  eigenvalues. 

The  initial  directional  PCA  bases  are  calculated  following  a  simple  numerical  procedure.  Directions  from 
0  to  ;r  are  uniformly  sampled  to  K  angles,  and  one  PCA  basis  is  calculated  per  angle.  The  calculation 
of  the  PCA  at  an  angle  9  uses  a  synthetic  blank-and-white  edge  image  following  the  same  direction,  as 
illustrated  in  Figure  [2l-(b).  Local  patches  that  touch  the  contour  are  collected  and  are  used  to  calculate  the 
PCA  basis  (following  (fTOl)  and  (fTSl)).  The  first  atom,  which  is  almost  DC,  is  replaced  by  DC,  and  a  Gram- 
Schmidt  orthogonalization  is  calculated  on  the  other  atoms  to  ensure  the  orthogonality  of  the  basis.  The 
patches  contain  edges  that  are  translation-invariant.  As  the  covariance  of  a  stationary  process  is  diagonalized 
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by  the  Fourier  basis,  unsurprisingly,  the  resulting  PCA  basis  has  first  few  important  atoms  similar  to  the 
cosines  atoms  oscillating  in  the  direction  0  from  low-frequency  to  high-frequency,  as  shown  in  Figure  [2l-(c). 
Comparing  with  the  Fourier  vectors,  these  PCAs  enjoy  the  advantage  of  being  free  of  the  periodic  boundary 
issue,  so  that  they  can  provide  sparse  representations  for  local  image  patches.  The  eigenvalues  of  all  the  bases 
are  initiated  with  the  same  ones  obtained  from  the  synthetic  contour  image,  that  have  fast  decay.  Figure [2|-(d). 
These,  following  (fTSl).  complete  the  covariance  initialization.  The  Gaussian  means  are  initialized  with  zeros. 

It  is  worth  noting  that  this  directional  PCA  basis  not  only  provides  sparse  representations  for  contours 
and  edges,  but  it  captures  well  textures  of  the  same  directionality  as  well.  Indeed,  in  a  space  of  dimension 
N  corresponding  to  patches  of  size  y/N  x  ^/N,  the  first  about  \/A  atoms  illustrated  in  Figure  [2l-(c)  absorb 
most  of  the  energy  in  local  patterns  following  the  same  direction  in  real  images,  as  indicated  by  the  fast 
decay  of  the  eigenvalues  (very  similar  to  Figure  [2l-(d)). 

A  typical  patch  size  is  ^/N  x  ^/N  =  8  x  8,  as  selected  in  previous  works  lO,  ES.  The  number  of  directions 
in  a  local  patch  is  limited  due  to  the  pixelization.  The  DCT  basis  is  also  included  in  competition  with  the 
directional  bases  to  capture  isotropic  image  patterns.  Our  experiments  have  shown  that  in  image  inverse 
problems,  there  is  a  significant  average  gain  in  PSNR  when  K  grows  from  0  to  3  (when  ^  =  0,  the  dictionary 
is  initialized  with  only  a  DCT  basis  and  all  the  patches  are  assigned  to  the  same  cluster),  which  shows  that 
one  Gaussian  model,  or  equivalently  a  single  linear  estimator,  is  not  enough  to  accurately  describe  the 
image.  When  K  increases,  the  gain  reduces  and  gets  stabilized  at  about  K  =  36.  Compromising  between 
performance  and  complexity,  ^  =  18,  which  corresponds  to  a  10°  angle  sampling  step,  is  selected  in  all  the 
future  experiments. 

Figures  [3l-(a)  and  (b)  illustrates  the  Lena  image  and  the  corresponding  patch  clustering,  i.e.,  the  model 
selection  ku  obtained  for  the  above  initialization,  calculated  with  ([7])  in  the  E-step  described  in  Section  [Til 
The  patches  are  densely  overlapped  and  each  pixel  in  Figure  [3l-(b)  represents  the  model  ki  selected  for  the 
8x8  patch  around  it,  different  colors  encoding  different  values  of  ki,  from  1  to  19  (18  directions  plus  a 
DCT).  One  can  observe,  for  example  on  the  edges  of  the  hat,  that  patches  where  the  image  patterns  follow 
similar  directions  are  clustered  together,  as  expected.  Let  us  note  that  on  the  uniform  regions  such  as  the 
background,  where  there  is  no  directional  preference,  all  the  bases  provide  equally  sparse  representations. 
As  the  log|Ey^|  =  term  in  the  model  selection  (|7])  is  initialized  as  identical  for  all  the  Gaussian 

models,  the  clustering  is  random  is  these  regions.  The  clustering  will  improve  as  the  MAP-EM  progresses. 

2)  Recoverability:  The  oscillatory  atoms  illustrated  in  Eigure  [2l-(c)  are  spread  in  space.  Therefore,  for 
diagonal  operators  in  space  such  as  masking  and  subsampling,  they  satisfy  well  the  recoverability  condition 
||U0^|P  >  0  for  super-resolution  described  in  Section Illl-Ai  However,  as  these  oscillatory  atoms  have  Dirac 
supports  in  Eourier,  for  convolution  operators,  the  recoverability  condition  is  violated.  Eor  convolution 
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Fig.  3.  (a).  Lena  image,  ((b)  to  (d)  are  color  images.)  (b).  Patch  clustering  obtained  with  the  initial  directional  PCAs 

(see  Figure  l2]-(c)).  The  patches  are  densely  overlapped  and  each  pixel  represents  the  model  ki  selected  for  the  8  x  8 
patch  around  it,  different  colors  encoding  different  direction  values  of  ki,  from  1  to  ^  =  19.  (c).  Patch  clustering 
obtained  with  the  initial  position  PCAs  (see  Figure  H]).  Different  colors  encoding  different  position  values  of  ku  from 
1  to  P  =  12.  (d)  and  (e).  Patch  clustering  with  respectively  directional  and  position  PCAs  after  the  2nd  iteration. 

operators  U,  ||U0^|p  >  0  requires  that  the  atoms  have  spread  Fourier  spectrum,  and  therefore  be  localized 
in  space.  Following  a  similar  numerical  scheme  as  described  above,  patches  touching  the  edge  at  a  fixed 
position  are  extracted  from  synthetic  edge  images  with  different  amounts  of  blur.  The  resulting  PCA  basis, 
named  position  PCA  basis  hereafter,  contains  localized  atoms  of  different  polarities  and  at  different  scales, 
following  the  same  direction  0,  as  illustrated  in  Figure  |4]  (which  look  like  wavelets  along  the  appropriate 
direction).  For  each  direction  0,  a  family  of  localized  PCA  bases  {^k,p}i<p<p  are  calculated  at  all  the 
positions  translating  within  the  patch.  The  eigenvalues  are  initialized  with  the  same  fast  decay  ones  as 
illustrated  in  Figure  [2l-(d)  for  all  the  position  PCA  bases.  Each  pixel  in  Figure  [3l-(c)  represents  the  model 
Pi  selected  for  the  8  x  8  patch  around  it,  different  colors  encoding  different  position  values  of  pu  from  1  to 
12.  The  rainbow-like  color  transitions  on  the  edges  show  that  the  position  bases  are  accurately  fitted  to  the 
image  structures. 

Fig.  4.  The  first  8  atoms  in  the  position  PCA  basis  with  the  largest  eigenvalues. 

3)  Wiener  Filtering  Interpretation:  Figure  [5]  illustrates  some  typical  Wiener  filters,  which  are  the  rows 
of  in  (H,  calculated  with  the  initial  PCA  bases  described  above  for  zooming  and  deblurring.  The  filters 
have  intuitive  interpretations,  for  example  directional  interpolator  for  zooming  and  directional  deconvolution 
for  deblurring,  confirming  the  effectiveness  of  the  initialization. 

D.  Additional  comments  on  related  works 

Before  proceeding  with  experimental  results  and  applications,  let  us  further  comment  on  some  related 
works,  in  addition  to  those  already  addressed  in  Section  [D 

The  MAP-EM  algorithm  using  various  probability  distributions  such  as  Gaussian,  Laplacian,  Gamma 
and  Gibbs  have  been  widely  applied  in  medical  image  reconstruction  and  analysis  (see  for  example  ITTOII. 
Boll).  Eollowing  the  Gaussian  mixture  models,  MAP-EM  alternates  between  image  patch  estimation  and 
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Fig.  5.  Some  filters  generated  by  the  MAP  estimator,  (a)  and  (b)  are  for  image  zooming,  where  the  degradation  operator 
U  is  a  2  X  2  subsampling  operator.  Gray-level  from  white  to  black:  values  from  negative  to  positive,  (a)  is  computed 
with  a  Gaussian  distribution  whose  PCA  basis  is  a  DCT  basis,  and  it  implements  an  isotropic  interpolator,  (b)  is 
computed  with  a  Gaussian  distribution  whose  PCA  basis  is  a  directional  PCA  basis  (angle  6  =  30°),  and  it  implements 
a  directional  interpolator,  (c)  and  (d)  are  shown  in  Fourier  and  are  for  image  deblurring,  where  the  degradation  operator 
U  is  a  Gaussian  convolution  operator.  Gray-level  from  white  to  black:  Fourier  modules  from  zero  to  positive,  (c)  is 
computed  with  a  Gaussian  distribution  whose  PCA  basis  is  a  DCT  basis,  and  it  implements  an  isotropic  deblurring 
filter,  (d)  is  computed  with  a  Gaussian  distribution  whose  PCA  basis  is  a  directional  PCA  basis  (angle  6  =  30°,  at  a 
fixed  position),  and  it  implements  a  directional  deblurring  filter. 

clustering,  and  Gaussian  models  estimation.  Clustering-based  estimation  has  been  shown  effective  for  image 
restoration.  To  achieve  accurate  clustering-based  estimation,  an  appropriate  clustering  is  at  the  heart.  In 
a  denoising  setting  where  images  are  noisy  but  not  degraded  by  the  linear  operator  U,  clustering  with 
block  matching,  i.e.,  calculating  Euclidian  distance  between  image  patch  gray-levels  ifTOll.  ifTTll.  iH^.  and 
with  image  segmentation  algorithms  such  as  k-means  on  local  image  features  lITSll.  have  been  shown  to 
improve  the  denoising  results.  For  inverse  problems  where  the  observed  images  are  degraded,  for  example 
images  with  holes  in  an  inpainting  setting,  clustering  becomes  more  difficult.  The  generalized  PCA  |[6^ 
models  and  segments  data  using  an  algebraic  subspace  clustering  technique  based  on  polynomial  fitting  and 
differentiation,  and  while  it  has  been  shown  effective  in  image  segmentation,  it  does  not  reach  state-of-the- 
art  in  image  restoration.  In  the  recent  non-parametric  Bayesian  approach  |EQl,  an  image  patch  clustering  is 
implemented  with  probability  models,  which  improves  the  denoising  and  inpainting  results,  although  still 
under  performing,  in  quality  and  computational  cost,  the  framework  here  introduced.  The  clustering  in  the 
MAP-EM  procedure  enjoys  the  advantage  of  being  completely  consistent  with  the  signal  estimation,  and  in 
consequence  leads  to  state-of-the-art  results  in  a  number  of  imaging  inverse  problem  applications. 

IV.  Initial  Supportive  Experiments 

Before  proceeding  with  detailed  experimental  results  for  a  number  of  applications  of  the  proposed  frame¬ 
work,  this  section  shows  through  some  basic  experiments  the  effectiveness  and  importance  of  the  initialization 
proposed  above,  the  evolution  of  the  representations  as  the  MAP-EM  algorithm  iterates,  as  well  as  the 
improvement  brought  by  the  structure  in  PEE  with  respect  to  traditional  sparse  estimation. 

Following  some  recent  works,  e.g.,  an  image  is  decomposed  into  128  x  128  regions,  each  region 
treated  with  the  MAP-EM  algorithm  separately.  The  idea  is  that  image  contents  are  often  more  coherent 
semi-locally  than  globally,  and  Gaussian  model  estimation  or  dictionary  learning  can  be  slightly  improved 
in  semi-local  regions.  This  also  saves  memory  and  enables  the  processing  to  proceed  as  the  image  is  being 
transmitted.  Parallel  processing  on  image  regions  is  also  possible  when  the  whole  image  is  available.  Regions 
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are  half-overlapped  to  eliminate  the  boundary  effect  between  the  regions,  and  their  estimates  are  averaged 
at  the  end  to  obtain  the  final  estimate. 

A.  Initialization 

Different  initializations  are  compared  in  the  context  of  different  inverse  problems,  inpainting,  zooming 
and  deblurring.  The  reported  experiments  are  performed  on  some  typical  image  regions,  Lena’s  hat  with 
sharp  contours  and  Barbara’s  cloth  rich  in  texture,  as  illustrated  in  Figure  [6l 

Inpainting.  In  the  addressed  case  of  inpainting,  the  image  is  degraded  by  U,  that  is  a  random  masking 
operator  which  randomly  sets  pixel  values  to  zeros.  The  initialization  described  above  is  compared  with  a 
random  initialization,  which  initializes  in  the  E-step  all  the  missing  pixel  value  with  zeros  and  starts  with 
a  random  patch  clustering.  Figure  [6l-(a)  and  (b)  compare  the  PSNRs  obtained  by  the  MAP-EM  algorithm 
with  those  two  initializations.  The  algorithm  with  the  random  initialization  converges  to  a  PSNR  close  to, 
about  0.4  dB  lower  than,  that  with  the  proposed  initialization,  and  the  convergence  takes  much  longer  time 
(about  6  iterations)  than  the  latter  (about  3  iterations). 

It  is  worth  noting  that  on  the  contours  of  Lena’s  hat,  with  the  proposed  initialization  the  resulting  PSNR 
is  stable  from  the  initialization,  which  already  produces  accurate  estimation,  since  the  initial  directional  PCA 
bases  themselves  are  calculated  over  synthetic  contour  images,  as  described  in  Section  IIII-C[ 

Zooming.  In  the  context  of  zooming,  the  degradation  U  is  a  subsampling  operator  on  a  uniform  grid,  much 
structured  than  that  for  inpainting.  The  MAP-EM  algorithm  with  the  random  initialization  completely  fails 
to  work:  It  gets  stuck  in  the  initialization  and  does  not  lead  to  any  changes  on  the  degraded  image.  Instead 
of  initializing  the  missing  pixels  with  zeros,  a  bicubic  initialization  is  tested,  which  initializes  the  missing 
pixels  with  bicubic  interpolation.  Figure  [6l-(c)  shows  that,  as  the  MAP-EM  algorithm  iterates,  it  significantly 
improves  the  PSNR  over  the  bicubic  initialization,  however,  the  PSNR  after  a  slower  convergence  is  still 
about  0.5  dB  lower  than  that  obtained  with  the  proposed  initialization. 

Deblurring.  In  the  deblurring  setting,  the  degradation  U  is  a  convolution  operator,  which  is  very  structured, 
and  the  image  is  further  contaminated  with  a  white  Gaussian  noise.  Four  initializations  are  under  considera¬ 
tion:  the  initialization  with  directional  PCAs  {K  directions  plus  a  DCT  basis),  which  is  exactly  the  same  as 
that  for  inpainting  and  zooming  tasks,  the  proposed  initialization  with  the  position  PCA  bases  for  deblurring 
as  described  in  Section  1111-021  (P  positions  per  each  of  the  K  directions,  all  with  the  same  eigenvalues  as  for 
the  directional  PCAs  initialization),  and  two  random  initializations  with  the  blurred  image  itself  as  the  initial 
estimate  and  a  random  patch  clustering  with,  respectively,  ^+1  and  (^+1)P  clusters.  As  illustrated  in 
Figure  [6l-(d),  the  algorithm  with  the  directional  PCAs  initialization  gets  stuck  in  a  local  minimum  since  the 
second  iteration,  and  converges  to  a  PSNR  1.5  dB  lower  than  that  with  the  initialization  using  the  position 
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PCAs.  Indeed,  since  the  recoverability  condition  for  deblurring,  as  explained  in  Section  IIII-C21  is  violated 
with  just  directional  PCA  bases,  the  resulting  images  remain  still  quite  blurred.  The  random  initialization 
with  (^+  1)P  clusters  results  in  better  results  than  with  ^+1  clusters,  which  is  0.7  dB  worse  than  the 
proposed  initialization  with  position  PCAs. 

These  experiments  confirm  the  importance  of  the  initialization  in  the  MAP-EM  algorithm  to  solve  inverse 
problems.  The  sparse  coding  dual  interpretation  of  GMM/MAP-EM  helps  to  deduce  effective  initializations 
for  different  inverse  problems.  While  for  inpainting  with  random  masking  operators,  trivial  initializations 
slowly  converge  to  a  solution  moderately  worse  than  that  obtained  with  the  proposed  initialization,  for  more 
structured  degradation  operators  such  as  uniform  subsampling  and  convolution,  simple  initializations  either 
completelv  fail  to  work  or  lead  to  significantly  worse  results  than  with  the  proposed  initialization. 


Fig.  6.  PSNR  comparison  of  the  MAP-EM  algorithm  with  different  initializations  on  different  inverse  problems.  The 
horizontal  axis  corresponds  to  the  number  of  iterations,  (a)  and  (b).  Inpainting  with  50%  and  30%  available  data,  on 
Lena’s  hat  and  Barbara’s  cloth.  The  initializations  under  consideration  are  the  random  initialization  and  the  initialization 
with  directional  PCA  bases,  (c)  Zooming,  on  Lena’s  hat.  The  initializations  under  consideration  are  bicubic  initialization 
and  the  initialization  with  directional  PCA  bases.  (Random  initialization  completely  fails  to  work.)  (d)  Deblurring,  on 
Lena’s  hat.  The  initializations  under  consideration  are  the  initialization  with  directional  PCAs  {K  directions  plus  a 
DCT  basis),  the  initialization  with  the  position  PCA  bases  {P  positions  per  each  of  the  K  directions),  and  two  random 
initializations  with  the  blurred  image  itself  as  the  initial  estimate  and  a  random  patch  clustering  with,  respectively, 
K^  \  (rand.  1)  and  {K^\)P  (rand.  2)  clusters.  See  text  for  more  details. 

B.  Evolution  of  Representations 

Eigure  [7] illustrates,  in  an  inpainting  context  on  Barbara’s  cloth,  which  is  rich  in  texture,  the  evolution  of  the 
patch  clustering  as  well  as  that  of  a  typical  PCA  bases  as  the  MAP-EM  algorithm  iterates.  The  clustering  gets 
cleaned  up  as  the  algorithm  iterates.  (See  figures  [3]-(d)  and  (e)  for  another  example.)  Some  high-frequency 
atoms  are  promoted  to  better  capture  the  oscillatory  patterns,  resulting  in  a  significant  PSNR  improvement 
of  more  than  3  dB.  On  contour  images  such  as  Lena’s  hat  illustrated  in  Eigure  [6l  on  the  contrary,  although 
the  patch  clustering  is  cleaned  up  as  the  algorithm  iterates,  the  resulting  local  PSNR  evolves  little  after  the 
initialization,  which  already  produces  accurate  estimation,  since  the  directional  PCA  bases  themselves  are 
calculated  over  synthetic  contour  images,  as  described  in  Section  IIII-C[  The  eigenvalues  have  always  fast 
decay  as  the  iteration  goes  on,  visually  similar  to  the  plot  in  Eigure  [2l-(d).  The  resulting  PSNRs  typically 
converge  in  3  to  5  iterations. 
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(a)  (b)  (c)  (d)  (e) 

Fig.  7.  Evolution  of  the  representations,  (a)  The  original  image  cropped  from  Barbara,  (b)  The  image  masked  with 
30%  available  data,  (c)  and  (d)  are  color  images,  (c)  Bottom:  The  first  few  atoms  of  an  initial  PCA  basis  corresponding 
to  the  texture  on  the  right  of  the  image.  Top:  The  resulting  patch  clustering  after  the  1st  iteration.  Different  colors 
represent  different  clusters,  (d)  Bottom:  The  first  few  atoms  of  the  PCA  basis  updated  after  the  1st  iteration.  Top:  The 
resulting  patch  clustering  after  the  2nd  iteration,  (e)  The  inpainting  estimate  after  the  2nd  iteration  (32.30  dB). 


C.  Estimation  Methods 


(a)  Original  image. 


(b)  Low-resolution  image.  (c)  Global  h:  22.70  dB  (d)  Global  OMP:  28.24  dB 


(e)  Block  h:  26.35  dB 


(f)  Block  OMP:  29.27  dB  (g)  Block  weighted  l\:  35.94  dB 


(h)  Block  weighted  I2’.  36.45  dB 


Fig.  8.  Comparison  of  different  estimation  methods  on  super-resolution  zooming,  (a)  The  original  image  cropped 
from  Lena,  (b)  The  low-resolution  image,  shown  at  the  same  scale  by  pixel  duplication.  From  (c)  to  (h)  are  the 
super-resolution  results  obtained  with  different  estimation  methods.  See  text  for  more  details. 


From  the  sparse  coding  point  of  view,  the  gain  of  introducing  structure  in  sparse  inverse  problem  estimation 
as  described  in  Section  [Till  is  now  shown  through  some  experiments.  An  overcomplete  dictionary  D  composed 
of  a  family  of  PCA  bases  illustrated  in  Figure  [I]-(b),  is  learned  as  described  in  Section  [III  and 

is  then  fed  to  the  following  estimation  schemes,  (i)  Global  h  and  OMP:  the  ensemble  of  D  is  used  as 
an  overcomplete  dictionary,  and  the  zooming  estimation  is  calculated  with  the  sparse  estimate  (fT3])  through, 
respectively,  an  li  minimization  or  an  orthogonal  matching  pursuit  (OMP).  (ii)  Block  li  and  OMP:  the 
sparse  estimate  is  calculated  in  each  PCA  basis  through,  respectively  an  /i  mininfization  and  an  OMP, 
and  the  best  estimate  is  selected  with  a  model  selection  procedure  similar  to  ©,  thereby  reducing  the  degree 
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of  freedom  in  the  estimation  with  respect  to  the  global  h  and  OMR  ll67l.  (hi)  Block  weighted  h:  on  top 
of  the  block  /i,  weights  are  included  for  each  coefficient  amplitude  in  the  regularizer, 

I  a,- [ml 


N 


af  =  argmin  ||U/By^a/-y/f +  CJ^  ^ 


q-K 

m=l  m 


(18) 


with  the  weights  =  (A^)^/^,  where  are  the  eigenvalues  of  the  ^-th  RCA  basis.  The  weighting  scheme 
penalizes  the  atoms  that  are  less  likely  to  be  important,  following  the  spirit  of  the  weighted  I2  deduced  from 
the  MAR  estimate,  (iv)  Block  weighted  I2:  the  proposed  RLE.  Comparing  with  (fT^.  the  difference  is  that 
the  weighted  I2  (fTTl)  takes  the  place  of  the  weighted  /i,  thereby  transforming  the  problem  into  a  stable  and 
computationally  efficient  piecewise  linear  estimation. 

The  comparison  on  a  typical  region  of  Lena  in  the  2x2  image  zooming  context  is  shown  in  Figure  [8l 
The  global  h  and  OMR  produce  some  clear  artifacts  along  the  contours,  which  degrade  the  RSNRs.  The 
block  /i  or  OMR  considerably  improves  the  results  (especially  for  /i).  Comparing  with  the  block  h  or  OMR, 
a  very  significant  improvement  is  achieved  by  adding  the  collaborative  weights  on  top  of  the  block  /i.  The 
proposed  RLE  with  the  block  weighted  I2,  computed  with  linear  filtering,  further  improves  the  estimation 
accuracy  over  the  block  weighted  /i,  with  a  much  lower  computational  cost. 

In  the  following  sections,  RLE  will  be  applied  to  a  number  of  inverse  problems,  including  image  inpainting, 
zooming  and  deblurring.  The  experiments  are  performed  on  some  standard  gray-level  and  color  images, 
illustrated  in  Figure  [H 


Fig.  9.  Images  used  in  the  numerical  experiments.  From  top  to  bottom,  left  to  right.  The  first  eight  are  gray-level  images: 
Lena  (512  x  512),  Barbara  (512  x  512),  Reppers  (512  x  512),  Mandril  (512  x  512),  House  (256  x  256),  Cameraman 
(256  X  256),  Boats(512  x  512),  and  Straws  (640  x  640).  The  rest  are  color  images:  Castle  (481  x  321),  Mushroom 
(481  X  321),  Kangaroo  (321  x  481),  Train  (321  x  481),  Horses  (321  x  481),  Kodak05  (512  x  768),  Kodak20  (512  x  768), 
Girl  (258  x  255),  and  Flower  (171  x  330). 


V.  Inpainting 

In  the  addressed  case  of  inpainting,  the  original  image  f  is  masked  with  a  random  mask,  y  =  Uf,  where  U 
is  a  diagonal  matrix  whose  diagonal  entries  are  randomly  either  1  or  0,  keeping  or  killing  the  corresponding 
pixels.  Note  that  this  can  be  considered  as  a  particular  case  of  compressed  sensing,  or  when  collectively 
considering  all  the  image  patches,  as  matrix  completion  (and  as  here  demonstrated,  in  contrast  with  the 
recent  literature  on  the  subject,  a  single  subspace  is  not  sufficient,  see  also  IItTII). 
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The  experiments  are  performed  on  the  gray-level  images  Lena,  Barbara,  House,  and  Boat,  and  the  color 
images  Castle,  Mushroom,  Train  and  Horses.  Uniform  random  masks  that  retain  80%,  50%,  30%  and  20% 
of  the  pixels  are  used.  The  masked  images  are  then  inpainted  with  the  algorithms  under  consideration. 

For  gray-level  images,  the  image  patch  size  is  \/N  x  \/N  =  8x8  when  the  available  data  is  80%,  50%, 
and  30%.  Larger  patches  of  size  12  x  12  are  used  when  images  are  heavily  masked  with  only  20%  pixels 
available.  For  color  images,  patches  of  size  \/N  x  \/N  x  3  throughout  the  RGB  color  channels  are  used  to 
exploit  the  redundancy  among  the  channels  ll43ll.  To  simplify  the  initialization  in  color  image  processing, 
the  E-step  in  the  first  iteration  is  calculated  with  “gray-level”  patches  of  size  \/N  x  \/N  on  each  channel,  but 
with  a  unified  model  selection  across  the  channels:  The  same  model  selection  is  performed  throughout  the 
channels  by  minimizing  the  sum  of  the  model  selection  energy  ©  over  all  the  channels;  the  signal  estimation 
is  calculated  in  each  channel  separately.  The  M-step  then  estimates  the  Gaussian  models  with  the  “color” 
patches  of  size  \/N  x  \/N  x  3  based  on  the  model  selection  and  the  signal  estimate  previously  obtained  in 
the  E-step.  Starting  from  the  second  iteration,  both  the  E-  and  M-steps  are  calculated  with  “color”  patches, 
treating  the  \/N  x  \/N  x  3  patches  as  vectors  of  size  3N.  \/N  is  set  to  6  for  color  images,  as  in  the  previous 
works  ll43ll.  iTTTIl.  The  MAP-EM  algorithm  runs  for  5  iterations.  The  noise  standard  deviation  a  is  set  to 
3,  which  corresponds  to  the  typical  noise  level  in  these  images.  The  small  constant  E  in  the  covariance 
regularization  is  set  to  30  in  all  the  experiments. 

The  PLE  inpainting  is  compared  with  a  number  of  recent  methods,  including  “MCA”  (morphological 
component  analysis)  1(2^.  “ASR”  (adaptive  sparse  reconstructions)  ll33]l  ,  “ECM”  (expectation  conditional 
maximization)  |[27ll  ,  “KR”  (kernel  regression)  llbOll.  “EOE”  (fields  of  experts)  ll56ll.  “BP”  (beta  process)  iTTTIl. 
and  “K-SVD”  ll43ll.  MCA  and  ECM  compute  the  sparse  inverse  problem  estimate  in  a  dictionary  that 
combines  a  curvelet  frame  GUI,  a  wavelet  frame  B31l  and  a  local  DCT  basis.  ASR  calculates  the  sparse 
estimate  with  a  local  DCT.  BP  infers  a  nonparametric  Bayesian  model  from  the  image  under  test  (noise 
level  is  automatically  estimated).  Using  a  natural  image  training  set,  POE  and  K-SVD  learn  respectively  a 
Markov  random  field  model  and  an  overcomplete  dictionary  that  gives  sparse  representation  for  the  images. 
The  results  of  MCA,  ECM,  KR,  POE  are  generated  by  the  original  authors’  softwares,  with  the  parameters 
manually  optimized,  and  those  of  ASR  are  calculated  with  our  own  implementation.  The  PSNRs  of  BP 
and  K-SVD  are  cited  from  the  corresponding  papers.  K-SVD  and  BP  currently  generate  the  best  inpainting 
results  in  the  literature. 

Table  [D-left  gives  the  inpainting  results  on  gray-level  images.  PLE  considerably  outperforms  the  other 
methods  in  all  the  cases,  with  a  PSNR  improvement  of  about  2  dB  on  average  over  the  second  best  algorithms 
(BP,  POE  and  MCA).  With  20%  available  data  on  Barbara,  which  is  rich  in  textures,  it  gains  as  much  as 
about  3  dB  over  MCA,  4  dB  over  ECM  and  6  dB  over  all  the  other  methods.  Let  us  remark  that  when  the 
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missing  data  ratio  is  high,  MCA  generates  quite  good  results,  as  it  benefits  from  the  curvelet  atoms  that 
have  large  support  relatively  to  the  local  patches  used  by  the  other  methods. 

Figure  [T0|  compares  the  results  of  different  algorithms.  All  the  methods  lead  to  good  inpainting  results 
on  the  smooth  regions.  MCA  and  KR  are  good  at  capturing  contour  structures.  However,  when  the  curvelet 
atoms  are  not  correctly  selected,  MCA  produces  noticeable  elongated  curvelet-like  artifacts  that  degrade 
the  visual  quality  and  offset  its  gain  in  PSRN  (see  for  example  the  face  of  Barbara).  MCA  restores  better 
textures  than  BP,  ASR,  FOE  and  KR.  PLE  leads  to  accurate  restoration  on  both  the  directional  structures 
and  the  textures,  producing  the  best  visual  quality  with  the  highest  PSNRs.  An  additional  PLE  inpainting 
examples  is  shown  in  Eigure  U\ 
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41.14 

43.12 

50% 

34.93 

33.87 

33.49 

33.56 

34.33 

35.47 

37,03 

30% 

31.32 

29.90 

29.87 

30.11 

30.56 

30.99 

33,18 

20% 

29.04 

28.07 

27.68 

28.19 

28.53 

28.43 

31.21 

Data  ratio 

BP 

PLE 

Castle 

80% 

41.51 

48.26 

50% 

36.45 

38,34 

30% 

32.02 

33,01 

20% 

29.12 

30,07 

Mushroom 

80% 

42.56 

49.25 

50% 

38.88 

40.72 

30% 

34.63 

35.36 

20% 

31.56 

32,06 

Train 

80% 

40.73 

44.01 

50% 

32.00 

32.75 

30% 

27.00 

27.46 

20% 

24.59 

24.73 

Horses 

80% 

41.97 

48,83 

50% 

37.27 

38,52 

30% 

32.52 

32.99 

20% 

29.99 

30.26 

Average 

80% 

41.69 

47.59 

50% 

36.15 

37,58 

30% 

31.54 

32,18 

20% 

28.81 

29.28 

TABLE  I 

PSNR  COMPARISON  ON  GRAY-LEVEL  (LEET)  AND  COLOR  (RIGHT)  IMAGE  INPAINTING.  FOR  EACH  IMAGE, 
UNIFORM  RANDOM  MASKS  WITH  FOUR  AVAILABLE  DATA  RATIOS  ARE  TESTED.  THE  ALGORITHMS  UNDER 
CONSIDERATION  ARE  MCA  (Mil,  ASR  (Ml  ,  ECM  (Ml  ,  KR  (Ml,  FOE  (56l,  BP  (Ml,  and  the  proposed  PLE 
FRAMEWORK.  THE  BOTTOM  BOX  SHOWS  THE  AVERAGE  PSNRS  GIVEN  BY  EACH  METHOD  OVER  ALL  THE  IMAGES 
AT  EACH  AVAILABLE  DATA  RATIO.  THE  HIGHEST  PSNR  IN  EACH  ROW  IS  IN  BOLDFACE.  THE  ALGORITHMS  WITH  * 

USE  A  TRAINING  DATASET. 


Table  [D-right  compares  the  PSNRs  of  the  PLE  color  image  inpainting  results  with  those  of  BP  (the  only 
one  in  the  literature  that  reports  the  comprehensive  comparison  in  our  knowledge).  Again,  PLE  generates 
higher  PSNRs  in  all  the  cases.  While  the  gain  is  especially  large,  at  about  6  dB,  when  the  available  data 
ratio  is  high  (at  80%),  for  the  other  masking  rates,  it  is  mostly  between  0.5  and  1  dB.  Both  methods  use 
only  the  image  under  test  to  learn  the  dictionaries. 

Eigure  \TT\  illustrates  the  PLE  inpainting  result  on  Castle  with  20%  available  data.  Calculated  with  a  much 
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(a)  Original 


(e)  KR  (21.55  dB) 


(b)  Masked 


(f)  FOE  (21.92  dB) 


(c)  MCA  (24.18  dB) 


(g)  BP  (25.54  dB) 


(d)  ASR  (21.84  dB) 


(h)  PLE  (27.65  dB) 


Fig.  10.  Gray-level  image  inpainting,  (a)  Original  image  cropped  from  Barbara,  (b)  Masked  image  with  20%  available 
data  (6.81  dB).  From  (c)  to  (g):  Image  inpainted  by  different  algorithms.  Note  the  overall  superior  visual  quality 
obtained  with  the  proposed  approach.  The  PSNRs  are  calculated  on  the  cropped  images. 


reduced  computational  complexity,  the  resulting  30.07  dB  PSNR  surpasses  the  highest  PSNR,  29.65  dB, 
reported  in  the  literature,  produced  by  K-SVD  ll43l.  that  uses  a  dictionary  learned  from  a  natural  image 
training  set,  followed  by  29.12  dB  given  by  BP  (BP  has  been  recently  improved  adding  spatial  coherence 
in  the  code,  unpublished  results).  As  shown  in  the  zoomed  region,  PLE  accurately  restores  the  details  of  the 
castle  from  the  heavily  masked  image.  Let  us  remark  that  inpainting  with  random  masks  on  color  images  is 
in  general  more  favorable  than  on  gray-level  images,  thanks  to  the  information  redundancy  among  the  color 
channels. 


(a)  Original  (b)  Masked  (c)  PLE 

Fig.  11.  Color  image  inpainting,  (a)  Original  image  cropped  from  Castle,  (b)  Masked  image  with  20%  available  data 
(5.44  dB).  (c)  Image  inpainted  by  PLE  (27.30  dB).  The  PSNR  on  the  overall  image  obtained  with  PLE  is  30.07  dB, 
higher  than  the  best  result  reported  so  far  in  the  literature  29.65  dB  14^. 


VI.  Interpolation  zooming 


Interpolation  zooming  is  a  special  case  of  inpainting  with  regular  subsampling  on  uniform  grids.  As 
explained  in  Section  IIII-Ai  the  regular  subsampling  operator  U  may  result  in  a  highly  coherent  transformed 


June  15,  2010 


DRAFT 


21 


dictionary  UD.  Calculating  an  accurate  sparse  estimation  for  interpolation  zooming  is  therefore  more  difficult 
than  that  for  inpainting  with  random  masks. 

The  experiments  are  performed  on  the  gray-level  images  Lena,  Peppers,  Mandril,  Cameraman,  Boat,  and 
Straws,  and  the  color  images  Lena,  Peppers,  KodakOS  and  Kokad20.  The  color  images  are  treated  in  the 
same  way  as  for  inpainting.  These  high-resolution  images  are  down-sampled  by  a  factor  2x2  without 
anti-aliasing  filtering.  The  resulting  low-resolution  images  are  aliased,  which  corresponds  to  the  reality  of 
television  images  that  are  usually  aliased,  since  this  improves  their  visual  perception.  The  low-resolution 
images  are  then  zoomed  by  the  algorithms  under  consideration.  When  the  anti-aliasing  blurring  operator  is 
included  before  subsampling,  zooming  can  be  casted  as  a  deconvolution  problem  and  will  be  addressed  in 
Section  rvlll 

The  PLE  interpolation  zooming  is  compared  with  linear  interpolators  lISTl.  l[63]l.  l[52ll  as  well  as  recent 
super-resolution  algorithms  “NEDI”  (new  edge  directed  interpolation)  |[39||,  “DEDE”  (directional  filtering 
and  data  fusion)  |[68l.  “KR”  (kernel  regression)  ||60||,  “ECM”  (expectation  conditional  maximization)  |[27ll. 
“Contourlet”  |l5ll|,  “ASR”  (adaptive  sparse  reconstructions)  l[3^.  “EOE”  (fields  of  experts)  l[56]l.  “SR” 
(sparse  representation)  ||65l,  “SAI”  (soft-decision  adaptive  Interpolation)  1^  and  “SME”  (sparse  mixing 
estimators)  ll47l.  KR,  ECM,  ASR  and  EOE  are  generic  inpainting  algorithms  that  have  been  described  in 
Section  |Vl  NEDI,  DEDE  and  SAI  are  adaptive  directional  interpolation  methods  that  take  advantage  of  the 
image  directional  regularity.  Contourlet  is  a  sparse  inverse  problem  estimator  as  described  in  Section  IIII-Al 
computed  in  a  contourlet  frame.  SR  is  also  a  sparse  inverse  estimator  that  learns  the  dictionaries  from 
a  training  image  set.  SME  is  a  recent  zooming  algorithm  that  exploits  directional  structured  sparsity  in 
wavelet  representations.  Among  the  previously  published  algorithms,  SAI  and  SME  currently  provide  the 
best  PSNR  for  spatial  image  interpolation  zooming  HTll.  |[6^.  The  results  of  ASR  are  generated  with  our  own 
implementation,  and  those  of  all  the  other  algorithms  are  produced  by  the  original  authors’  softwares,  with 
the  parameters  manually  optimized.  As  the  anti-aliasing  operator  is  not  included  in  the  interpolation  zooming 
model,  to  obtain  correct  results  with  SR,  the  anti-aliasing  filter  used  in  the  original  authors’  SR  software  is 
deactivated  in  both  dictionary  training  (with  the  authors’  original  training  dataset  of  92  images)  and  super¬ 
resolution  estimation.  PLE  is  configured  in  the  same  way  as  for  inpainting  as  described  in  Section  |Vl  with 
patch  size  8  x  8  for  gray-level  images,  and  6  x  6  x  3  for  color  images. 

Table  [II|  gives  the  PSNRs  generated  by  all  algorithms  on  the  gray-level  and  the  color  images.  Bicubic 
interpolation  provides  nearly  the  best  results  among  all  tested  linear  interpolators,  including  cubic  splines  1^. 
MOMS  lO  and  others  ll5^.  due  to  the  aliasing  produced  by  the  down-sampling.  PLE  gives  moderately  higher 
PSNRs  than  SME  and  SAI  for  all  the  images,  with  one  exception  where  the  SAI  produces  slightly  higher 
PSNR.  Their  gain  in  PSNR  is  significantly  larger  than  with  all  the  other  algorithms. 
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Figure  [l2]  compares  an  interpolated  image  obtained  by  the  baseline  bicubic  interpolation  and  the  algorithms 
that  generate  the  highest  PSNRs,  SAI  and  PLE.  The  local  PSNRs  on  the  cropped  images  produced  by  all 
the  methods  under  consideration  are  reported  as  well.  Bicubic  interpolation  produces  some  blur  and  jaggy 
artifacts  in  the  zoomed  images.  These  artifacts  are  reduced  to  some  extent  by  the  NEDI,  DEDE,  KR  and  EOE 
algorithms,  but  the  image  quality  is  still  lower  than  with  PLE,  SAI  and  SME  algorithms,  as  also  reflected  in 
the  PSNRs.  SR  yields  an  image  that  looks  sharp.  However,  due  to  the  coherence  of  the  transformed  dictionary, 
as  explained  in  Section  IIII-Al  when  the  approximating  atoms  are  not  correctly  selected,  it  produces  artifact 
patterns  along  the  contours,  which  degrade  its  PSNR.  The  PLE  algorithm  restores  slightly  better  than  SAI 
and  SME  on  regular  geometrical  structures,  as  can  be  observed  on  the  upper  and  lower  propellers,  as  well 
as  on  the  fine  lines  on  the  side  of  the  plane  indicated  by  the  arrows. 


Bicubic 

NEDI 

DFDF 

KR 

ECM 

Contourlet 

ASR 

FOE* 

SAI 

SME 

PLE 

Lena 

33.93 

33.77 

33.91 

33.94 

24.31 

33.92 

33.19 

34.04 

34.68 

34.58 

34,76 

Peppers 

32.83 

33.00 

33.18 

33.15 

23.60 

33.10 

32.33 

31.90 

33.52 

33.52 

33.62 

Mandril 

22.92 

23.16 

22.83 

22.93 

20.34 

22.53 

22.66 

22.99 

23.19 

23.16 

23.27 

Cameraman 

25.37 

25.42 

25.67 

25.51 

19.50 

25.35 

25.33 

25.58 

25.88 

26.26 

26.47 

Boat 

29.24 

29.19 

29.32 

29.18 

22.20 

29.25 

28.96 

29.36 

29.68 

29.76 

29.93 

Straws 

20.53 

20.54 

20.70 

20.76 

17.09 

20.52 

20.54 

20.47 

21.48 

21.61 

21.82 

Ave.  gain 

0 

0.04 

0.13 

0.11 

-6.30 

-0.02 

-0.30 

-0.08 

0.60 

0.68 

0.84 

Bicubic 

NEDI 

DFDF 

KR 

FOE* 

SR* 

SAI 

SME 

PLE 

Lena 

32.41 

32.47 

32.46 

32.55 

32.55 

26.42 

32.98 

32.88 

33.53 

Peppers 

30.95 

31.06 

31.24 

31.26 

31.05 

26.43 

31.37 

31.35 

31.88 

Kodak05 

25.82 

25.93 

26.03 

26.09 

26.01 

20.76 

26.91 

26.72 

26.77 

Kodak20 

30.65 

31.06 

31.08 

30.97 

30.84 

25.92 

31.51 

31.38 

31.72 

Ave.  gain 

0 

0.17 

0.25 

0.27 

0.16 

-5.07 

0.74 

0.63 

1.02 

TABLE  II 

PSNR  COMPARISON  ON  GRAY-LEVEL  (TOP)  AND  COLOR  (BOTTOM)  IMAGE  INTERPOLATION  ZOOMING.  THE 
ALGORITHMS  UNDER  CONSIDERATION  ARE  BICUBIC  INTERPOLATION,  NEDI  EH,  DFDF  [IMl,  KR  [l60ll. 
ECM  Il27]|,  CONTOURLET  [[ST]],  ASR  (SSI,  FOE  |l56l,  SR  [[65]|,  SAI  |l69l  ,  SME  |l47]|  AND  the  proposed  PLE 
ERAMEWORK.  THE  BOTTOM  ROW  SHOWS  THE  AVERAGE  GAIN  OF  EACH  METHOD  RELATIVE  TO  THE  BICUBIC 
INTERPOLATION.  THE  HIGHEST  PSNR  IN  EACH  ROW  IS  IN  BOLDFACE.  THE  ALGORITHMS  WITH  *  USE  A  TRAINING 

DATASET. 


VIE  Deblurring 

An  image  f  is  blurred  and  contaminated  by  additive  noise,  y  =  Uf+ w,  where  U  is  a  convolution  operator 
and  w  is  the  noise.  Image  deblurring  aims  at  estimating  f  from  the  blurred  and  noisy  observation  y. 

A.  Hierarchical  PLE 

As  explained  in  Section  IIII-C21  the  recoverability  condition  of  sparse  super-resolution  estimates  for 
deblurring  requires  a  dictionary  comprising  atoms  with  spread  Fourier  spectrum  and  thus  localized  in  space, 
such  as  the  position  PCA  basis  illustrated  in  Figure  IH  To  reduce  the  computational  complexity,  model 
selection  with  a  hierarchy  of  directional  PCA  bases  and  position  PCA  bases  is  proposed,  in  the  same  spirit 
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(a)  HR  (b)  LR  (c)  Bibubic  (d)  SAI  (e)  PLE 


Fig.  12.  Color  image  zooming,  (a)  Crop  from  the  high-resolution  image  Kodak20.  (b)  Low-resolution  image.  From  (c) 
to  (e),  images  zoomed  by  bicubic  interpolation  (28.48  dB),  SAI  (30.32  dB)  |[69l.  and  proposed  PLE  framework  (30.64 
dB).  PSNRs  obtained  by  the  other  methods  under  consideration:  NEDI  (29.68  dB)  ll39ll,  DEDE  (29.41  dB)  (681,  KR 
(29.49  dB)  (601,  FOE  (28.73  dB)  (56l,  SR  (23.85  dB)  (63,  and  SME  (29.90  dB)  (47l.  Attention  should  be  focused 
on  the  places  indicated  by  the  arrows. 

of  ||66l-  Figure  [T3l-(a)  illustrates  the  hierarchical  PLE  with  a  cascade  of  the  two  layers  of  model  selections. 
The  first  layer  selects  the  direction,  and  given  the  direction,  the  second  layer  further  specifies  the  position. 

In  the  first  layer,  the  model  selection  procedure  is  identical  to  that  in  image  inpainting  and  zooming,  i.e.,  it 
is  calculated  with  the  Gaussian  models  corresponding  to  the  directional  PCA  bases  {Bk}i<k<K,  Figure [2l-(c). 
In  this  layer,  a  directional  PCA  of  orientation  6  is  selected  for  each  patch.  Given  the  directional  basis 
selected  in  the  first  layer,  the  second  layer  recalculates  the  model  selection  ([7]),  this  time  with  a  family  of 
position  PCA  bases  {Bk^p}i<p<p  corresponding  to  the  same  direction  6  as  the  directional  basis  selected 
in  the  first  layer,  with  atoms  in  each  basis  B^^p  localized  at  one  position,  and  the  P  bases  translating  in 
space  and  covering  the  whole  patch.  The  image  patch  estimation  ([8])  is  obtained  in  the  second  layer.  This 
hierarchical  calculation  reduces  the  computational  complexity  from  ff{KP)  to  ff{K^P). 


(a)  (b) 


Fig.  13.  (a).  Hierarchical  PLE  for  deblurring.  Each  patch  in  the  first  layer  symbolizes  a  directional  PCA  basis.  Each 

patch  in  the  second  layer  symbolizes  a  position  PCA  basis,  (b)  To  circumvent  boundary  issues,  deblurring  a  patch 
whose  support  is  Q.  can  be  casted  as  inverting  an  operator  compounded  by  a  masking  and  a  convolution  defined  on  a 
larger  support  See  text  for  details. 

For  deblurring,  boundary  issues  on  the  patches  need  to  be  addressed.  Since  the  convolution  operator  is  non¬ 
diagonal,  the  deconvolution  of  each  pixel  y(x)  in  the  blurred  image  y  involves  the  pixels  in  a  neighborhood 
around  x  whose  size  depends  on  the  blurring  kernel.  As  the  patch  based  methods  deal  with  the  local  patches, 
for  a  given  patch,  the  information  outside  of  it  is  missing.  Therefore,  it  is  impossible  to  obtain  accurate 
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deconvolution  estimation  on  the  boundaries  of  the  patches.  To  circumvent  this  boundary  problem,  a  larger 
patch  is  considered  and  the  deconvolution  is  casted  as  a  deconvolution  plus  an  inpainting  problem.  Let  us 
retake  the  notations  f/,  y/  and  W/  to  denote  respectively  the  patches  of  size  ^/N  x  ^/N  in  the  original  image 
f,  the  degraded  image  y,  and  the  noise  w.  Let  Q.  be  their  support.  Let  f/,  y/  and  W/  be  the  corresponding 
larger  patches  of  size  {\/N  +  2r)  x  +  2r),  whose  support  Cl  is  centered  at  the  same  position  as  Q.  and 
with  an  extended  boundary  Cl\Q.  of  width  r  (the  width  of  the  blurring  kernel,  see  below),  as  illustrated  in 
Figure  [T3l-(b).  Let  U  be  an  extension  of  the  convolution  operator  U  on  ^  such  that  Uf/(x)  =  Uf/(x)  if  x  G  tl, 
and  0  if  X  G  Cl\Q..  Let  M  be  a  masking  operator  defined  on  Cl  which  keeps  all  the  pixels  in  the  central  part 
Cl  and  kills  the  rest,  i.e.,  Mf/(x)  =  f/(x)  if  x  G  tl,  and  0  if  x  G  Cl\Cl.  If  the  width  r  of  the  boundary  Cl\Cl  is 
larger  than  the  radius  of  the  blurring  kernel,  one  can  show  that  the  blurring  operation  can  be  rewritten  locally 
as  an  extended  convolution  on  the  larger  support  followed  by  a  masking.  My/  =  MUf/  +  Mw/.  Estimating  f/ 
from  y/  can  thus  be  calculated  by  estimating  f/  from  My/,  following  exactly  the  same  algorithm,  now  treating 
the  compounded  MU  as  the  degradation  operator  to  be  inverted.  The  boundary  pixels  in  the  estimate  f/(x), 
X  G  Cl\Cl,  can  be  interpreted  as  an  extrapolation  from  y/,  therefore  less  reliable.  The  deblurring  estimate  f/ 
is  obtained  by  discarding  these  boundary  pixels  from  f/  (which  are  outside  of  Cl  anyway). 

Local  patch  based  deconvolution  algorithms  become  less  accurate  if  the  blurring  kernel  support  is  large 
relative  to  the  patch  size.  In  the  deconvolution  experiments  reported  below.  Cl  and  Cl  are  respectively  set  to 
8x8  and  12  x  12.  The  blurring  kernels  are  restricted  to  a  5  x  5  support. 

B.  Deblurring  Experiments 

The  deblurring  experiments  are  performed  on  the  gray-level  images  Lena,  Barbara,  Boat,  House,  and 
Cameraman,  with  different  amounts  of  blur  and  noise.  The  PLE  deblurring  is  compared  with  a  number 
of  deconvolution  algorithms:  “EorWaRD”  (Eourier-wavelet  regularized  deconvolution)  lf53ll.  “TVB”  (total 
variation  based)  (Til,  “TwIST”  (two-step  iterative  shrinkage/thresholding)  lO,  “SP”  (sparse  prior)  IMI,  “SA- 
DCT”  (shape  adaptive  DCT)  |[28l,  “BM3D”  (3D  transform-domain  collaborative  filtering)  lIT^.  and  “DSD” 
(direction  sparse  deconvolution)  BH.  EorWaRD,  SA-DCT  and  BM3D  first  calculate  the  deconvolution  with 
a  regularized  Wiener  filter  in  Eourier,  and  then  denoise  the  Wiener  estimate  with,  respectively,  a  thresholding 
estimator  in  wavelet  and  SA-DCT  representations,  and  with  the  non-local  3D  collaborative  filtering  ifTTll. 
TVB  and  TwIST  deconvolutions  regularize  the  estimate  with  the  image  total  variation  prior.  SP  assumes  a 
sparse  prior  on  the  image  gradient.  DSD  is  a  recently  developed  sparse  inverse  problem  estimator,  described 
in  Section  IIII-A[  In  the  previous  published  works,  BM3D  and  SA-DCT  are  among  the  deblurring  methods 
that  produce  the  highest  PSNRs,  followed  by  SP.  The  results  of  TVB,  TwIST,  SP,  SA-DCT  and  DSD  are 
generated  by  the  authors’  original  softwares,  with  the  parameters  manually  optimized,  and  those  of  EorWaRD 
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are  calculated  with  our  own  implementation.  The  proposed  algorithm  runs  for  5  iterations. 

Table  [Till  gives  the  ISNRs  (improvement  in  PSNR  relative  to  the  input  image)  of  the  different  algorithms 
for  restoring  images  blurred  with  Gaussian  kernels  of  standard  deviation  CJ^  =  1  and  2  (truncated  to  a  5  x  5 
support),  and  contaminated  by  a  white  Gaussian  noise  of  standard  deviation  a^  =  5.  BM3D  produces  the 
highest  ISNRs,  followed  closely  by  SA-DCT  and  PLE,  whose  ISNRs  are  comparable  and  are  moderately 
higher  than  with  SP  on  average.  Let  us  remark  that  BM3D  and  SA-DCT  apply  an  empirical  Wiener  filtering 
as  a  post-processing  that  boosts  the  ISNR  by  near  1  dB.  The  empirical  Wiener  technique  can  be  plugged 
into  other  sparse  transform-based  methods  such  as  PLE  and  LorWaRD  as  well.  Without  this  post-processing, 
PLE  outperforms  BM3D  and  SA-DCT  on  average. 

Ligure  [14]  shows  a  deblurring  example.  All  the  algorithms  under  consideration  reduce  the  amount  of 
blur  and  attenuate  the  noise.  BM3D  generates  the  highest  ISNR,  followed  by  SA-DCT,  PLE  and  SP,  all 
producing  similar  visual  quality,  which  are  moderately  better  than  the  other  methods.  DSD  accurately  restores 
sharp  image  structures  when  the  atoms  are  correctly  selected,  however,  some  artifacts  due  to  the  incorrect 
atom  selection  offset  its  gain  in  ISNR.  The  empirical  Wiener  filtering  post-processing  in  BM3D  and  SA- 
DCT  efficiently  removes  some  artifacts  and  significantly  improves  the  visual  quality  and  the  ISNR.  More 
examples  of  PLE  deblurring  will  be  shown  in  the  next  section. 


Kernel  size  and  input  PSNR 

ForWaRD 

TVB 

TwIST 

SA-DCT 

BM3D 

SP 

DSD* 

PLE 

Lena 

Ob  =  1 

30.62 

2.51 

3.03 

2.87 

3.56/2.58 

4,03/3.45 

3.31 

2.56 

3.77 

Ob  =  2 

28.84 

2.33 

3.15 

3.13 

3.46/3.00 

3,91/3.20 

3.40 

2.47 

3.52 

House 

Ob  =  l 

30.04 

2.31 

3.12 

3.23 

4.14/3.07 

4.29/3.80 

3.52 

2.27 

LJI 

(N 

II 

28.02 

2.29 

3.24 

3.82 

4.21/3.64 

4.73/4.11 

3.92 

2.97 

3.90 

Boat 

Ob  =  l 

28.29 

1.69 

2.45 

2.44 

2.93/2.21 

3.23/2.46 

2.70 

1.93 

2.72 

Ob  =  2 

26.21 

1.63 

2.67 

2.59 

3.71/2.65 

3,33/2.44 

2.60 

2.02 

2.48 

Average 

Ob  =  1 

29.65 

2.17 

2.87 

2.84 

3.54/2.62 

3,85/3.23 

3.17 

2.25 

5.62 

(N 

II 

27.69 

2.08 

3.02 

3.18 

3.79/3.09 

3.99/3.25 

3.30 

2.48 

3.3] 

TABLE  III 

ISNR  (IMPROVEMENT  IN  PSNR  WITH  RESPECT  TO  INPUT  IMAGE)  COMPARISON  ON  IMAGE  DEBLURRING.  IMAGES 
ARE  BLURRED  BY  A  GAUSSIAN  KERNEL  OF  STANDARD  DEVIATION  (7^  =  1  AND  2,  AND  ARE  THEN  CONTAMINATED 

BY  WHITE  Gaussian  noise  of  standard  deviation  =  5.  From  left  to  right:  ForWaRD  |[53l,  TVB  ||7||, 
TwIST  im,  SA-DCT  (with/without  empirical  Wiener  post-processing)  ||28]|,  BM3D  (with/without 
empirical  Wiener  post-processing)  ||T8]],  SP  ||l8l,  DSD  [jHl,  and  the  proposed  PLE  framework.  The 

BOTTOM  BOX  SHOWS  THE  AVERAGE  ISNRS  GIVEN  BY  EACH  METHOD  OVER  ALL  THE  IMAGES  WITH  DIFFERENT 
AMOUNTS  OF  BLUR.  THE  HIGHEST  ISNR  IN  EACH  ROW  IS  IN  BOLDFACE,  WHILE  THE  HIGHEST  WITHOUT 
POST-PROCESSING  IS  IN  ITALIC.  THE  ALGORITHMS  WITH  *  USE  A  TRAINING  DATASET. 


C.  Zooming  deblurring 

When  an  anti-aliasing  filtering  is  taken  into  account,  image  zooming-out  can  be  formulated  as  y  =  SUf, 
where  f  is  the  high-resolution  image,  U  and  S  are  respectively  an  anti-aliasing  convolution  and  a  subsampling 
operator,  and  y  is  the  resulting  low-resolution  image.  Image  zooming  aims  at  estimating  f  from  y,  which 
amounts  to  inverting  the  combination  of  the  two  operators  S  and  U. 
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(a)  Original  (b)  Blurred  and  noisy  (c)  BM3D  (D)  PLE 


Fig.  14.  Gray-level  image  deblurring,  (a)  Crop  from  Lena,  (b)  Image  blurred  by  a  Gaussian  kernel  of  standard  deviation 
Gij  =  \  and  contaminated  by  white  Gaussian  noise  of  standard  deviation  On  =  5  (PSNR=27.10).  (c)  and  (d).  Images 
deblurred  by  BM3D  with  empirical  Wiener  post-processing  (ISNR  3.40  dB  dB)  and  the  proposed  PLE  framework 
(ISNR  2.94  dB).  ISNR  produced  by  the  other  methods  under  consideration:  BM3D  without  empirical  Wiener  post¬ 
processing  (2.65  dB)  lUSI,  TVB  (2.72  dB)  dll,  TwIST  (2.61dB)  (SI,  SP  (2.93  dB)  (HI,  SA-DCT  with/without  empirical 
Wiener  post-processing  (2.95/2.10  dB)  lESll.  and  DSD  (1.95  dB)  (411. 

Image  zooming  can  be  calculated  differently  under  different  amounts  of  blur  introduced  by  U.  Let  us 
distinguish  between  three  cases:  (i)  If  the  anti-aliasing  filtering  U  removes  enough  high-frequencies  from  f 
so  that  y  =  SUf  is  free  of  aliasing,  then  the  subsampling  operator  S  can  be  perfectly  inverted  with  a  linear 
interpolation  denoted  as  I,  i.e.,  IS  =  Id  ||45||.  In  this  case,  zooming  can  can  be  calculated  as  a  deconvolution 
problem  on  ly  =  Uf,  where  one  seeks  to  invert  the  convolution  operator  U.  In  reality,  however,  camera  and 
television  images  contain,  always  a  certain  amount  of  aliasing,  since  it  improves  the  visual  perception,  i.e., 
the  anti-aliasing  filtering  U  does  not  eliminate  all  the  high-frequencies  from  f.  (ii)  When  U  removes  a  small 
amount  of  high-frequencies,  which  is  often  the  case  in  reality,  zooming  can  be  casted  as  an  interpolation 
problem  [3911,  EZll,  [511.  [60l,  [68l.  [69l,  where  one  seeks  to  invert  only  S,  as  addressed  in  Section  |Vll 
(iii)  When  U  removes  an  intermediate  amount  of  blur  from  f,  the  optimal  zooming  solution  is  inverting  SU 
together  as  a  compounded  operator,  as  investigated  in  [65]l- 

This  section  introduces  a  possible  solution  for  the  case  (iii)  with  the  PLE  deblurring.  A  linear  interpolation 
I  is  first  applied  to  partially  invert  the  subsampling  operator  S.  Due  to  the  aliasing,  the  linear  interpolation 
does  not  perfectly  restore  Uf,  nevertheless  it  remains  rather  accurate,  i.e.,  the  interpolated  image  ly  =  ISUf 
is  close  to  the  blurred  image  Uf,  as  Uf  has  limited  high-frequencies  in  the  case  (iii).  The  PLE  deblurring 
framework  is  then  applied  to  deconvolve  U  from  ly.  Inverting  the  operator  U  is  simpler  than  inverting  the 
compounded  operator  SU.  As  the  linear  interpolation  I  in  the  first  step  is  accurate  enough  in  the  case  (iii), 
deconvolving  ly  results  in  accurate  zooming  estimates. 

In  the  experiments  below,  the  anti-aliasing  filter  U  is  set  as  a  Gaussian  convolution  of  standard  deviation 
cJg  =  1.0  and  Sisan^x^  =  2x2  subsampling  operator.  It  has  been  shown  that  a  pre-filtering  with  a 
Gaussian  kernel  of  cjg  =  0.8^  guarantees  that  the  following  ^  x  ^  subsampling  generates  a  quasi  aliasing-free 
image  [50ll.  Eor  a  2  x  2  subsampling,  the  anti-aliasing  filtering  U  with  cJg  =  1.0  leads  to  an  amount  of 
aliasing  and  visual  quality  similar  to  that  in  common  camera  pictures  in  reality. 

Eigure  [15]  illustrates  an  experiment  on  the  image  Lena.  Eigures  [l5|-(a)  to  (c)  show,  respectively,  a  crop  of 
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(a)  f  (b)  Uf  (c)  y  =  SUf  (d)  ly  (e)  PLE  (f)  SR 


Fig.  15.  Color  image  zooming  deblurring,  (a)  Crop  from  Lena:  f.  (b)  Image  pre-filtered  with  a  Gaussian  kernel  of 
standard  deviation  ctg  =  1.0:  Uf.  (c)  Image  subsampled  from  Uf  by  a  factor  of  2  x  2:  y  =  SUf.  (d)  Image  interpolated 
from  y  with  a  cubic  spline  interpolation:  ly  (31.03  dB).  (d)  Image  deblurred  from  ly  by  the  proposed  PLE  framework 
(34.27  dB).  (e)  Image  zoomed  from  y  with  1651  (29.66  dB).  The  PSNRs  are  calculated  on  the  cropped  image  between 
the  original  f  and  the  one  under  evaluation. 


the  original  image  f,  the  pre-filtered  version  Uf,  and  the  low-resolution  image  after  subsampling  y  =  SUf. 
As  the  amount  of  aliasing  is  limited  in  y  thanks  to  the  anti-aliasing  filtering,  a  cubic  spline  interpolation  is 
more  accurate  than  lower  ordered  interpolations  such  as  bicubic  Il63l.  and  is  therefore  applied  to  upsample 
y,  the  resulting  image  ly  illustrated  in  Figure  [l5l-(d).  A  visual  inspection  confirms  that  ly  is  very  close  to 
Uf,  the  PSNR  between  them  being  as  high  as  50.02  dB.  The  PLE  deblurring  is  then  applied  to  calculate  the 
final  zooming  estimate  f  by  deconvolving  U  from  ly.  (As  no  noise  is  added  after  the  anti-aliasing  filter,  the 
noise  standard  deviation  is  set  to  a  small  value  a  =  1.)  As  illustrated  in  Figure  [T5]-(e),  the  resulting  image 
f  is  much  sharper,  without  noticeable  artifacts,  and  improves  by  3.12  dB  with  respect  to  ly.  Figure  [T5l-(F) 
shows  the  result  obtained  with  “SR”  (sparse  representation)  ll^.  SR  implements  a  sparse  inverse  problem 
estimator  that  tries  to  invert  the  compounded  operator  SU,  with  a  dictionary  learned  from  a  natural  image 
dataset.  The  experiments  were  performed  with  the  authors’  original  software  and  training  image  set.  The 
dictionaries  were  retrained  with  the  US  described  above.  It  can  be  observed  that  the  resulting  image  looks 
sharper  and  the  restoration  is  accurate  when  the  atoms  selection  is  correct.  However,  due  to  the  coherence  of 
the  dictionaries  as  explained  in  Section  IIII-Ai  some  noticeable  artifacts  along  the  edges  are  produced  when 
the  atoms  are  incorrectly  selected,  which  also  offset  its  gain  in  PSNR. 

Figure  [T6|  shows  another  set  of  experiments  on  the  image  Girl.  Again  PLE  efficiently  reduces  the  blur 
from  the  interpolated  image  and  leads  to  a  sharp  zoomed  image  without  noticeable  artifacts.  SR  produces 
similarly  good  visual  quality  as  PLE,  however,  some  slight  but  noticeable  artifacts  (near  the  end  of  the  nose 
for  example)  due  to  the  incorrect  atom  selection  offset  its  gain  in  PSNR. 

Table  [IV|  gives  the  PSNRs  comparison  on  the  color  image  images  Lena,  Girl  and  Flower.  PLE  deblurring 
from  the  cubic  spline  interpolation  improves  from  1  to  2  dB  PSNR  over  the  interpolated  images.  Although 
SR  is  able  to  restore  sharp  images,  its  gain  in  PSNR  is  offset  by  the  noticeable  artifacts. 
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(a)  HR  (b)  LR  (c)  Cubic  spline  (c)  PLE  (d)  SR 


Fig.  16.  Color  image  zooming  deblurring,  (a)  Crop  from  Girl:  f.  (b)  Image  pre-filtered  with  a  Gaussian  kernel  of 
standard  deviation  Gg  =  1.0,  and  subsampled  by  a  factor  of  2  x  2:  y  =  SUf.  (c)  Image  interpolated  from  y  with  a  cubic 
spline  interpolation:  ly  (29.40  dB).  (d)  Image  deblurred  from  ly  by  the  proposed  PLE  framework  (30.49  dB).  (e)  Image 
zoomed  from  y  with  (63  (28.93  dB). 


Cubic  spline 

SR* 

PLE 

Lena 

31.60 

30.64 

33.78 

Girl 

30.62 

30.43 

31.82 

Flower 

37.02 

35.96 

39.06 

TABLE  IV 

PSNR  COMPARISON  IN  IMAGE  ZOOMING.  THE  HIGH-RESOLUTION  IMAGES  ARE  BLURRED  AND  SUBSAMPLED  TO 
GENERATE  THE  LOW-RESOLUTION  IMAGES.  THE  EIRST  COLUMN  SHOWS  THE  PSNR  PRODUCED  BY  CUBIC  SPLINE 
INTERPOLATION.  PLE  DEBLURRING  IS  APPLIED  OVER  THE  INTERPOLATED  IMAGES  AND  THE  RESULTING  PSNRS 

ARE  SHOWN  IN  THE  THIRD  COLUMN.  THE  SECOND  COLUMN  GIVES  THE  PSNR  OBTAINED  WITH  SR  ^6^.  THE 
HIGHEST  PSNR  IN  EACH  ROW  IS  IN  BOLDFACE.  THE  ALGORITHMS  WITH  *  USE  A  TRAINING  DATASET. 

VIIL  Conclusion  and  future  works 

This  work  has  shown  that  Gaussian  mixture  models  (GMM)  and  a  MAP-EM  algorithm  provide  general 
and  computational  efficient  solutions  for  inverse  problems,  leading  to  results  in  the  same  ballpark  as  state- 
of-the-art  ones  in  various  image  inverse  problems.  A  dual  mathematical  interpretation  of  the  framework  with 
Structured  sparse  estimation  is  described,  which  shows  that  the  resulting  piecewise  linear  estimate  stabilizes 
and  improves  the  traditional  sparse  inverse  problem  approach.  This  connection  also  suggests  an  effective 
dictionary  motivated  initialization  for  the  MAP-EM  algorithm.  In  a  number  of  image  restoration  applications, 
including  inpainting,  zooming,  and  deblurring,  the  same  simple  and  computationally  efficient  algorithm  (its 
core,  dH),  ([5]),  ([7])  and  ([8]),  can  be  written  in  4  lines  Matlab  code)  produce  either  equal,  often  significantly 
better,  or  very  small  margin  worse  results  than  the  best  published  ones,  with  a  computational  complexity 
typically  one  or  two  magnitude  smaller  than  /i  sparse  estimations.  Similar  results  (on  average  just  0.1  dB 
lower  than  BM3D  ifTTll)  are  obtained  for  the  simpler  problem  of  denoising  (U  the  identity  matrix). 

As  described  in  Section  [III  the  proposed  algorithm  is  calculated  with  classic  statistical  tools  of  MAP-EM 
clustering  and  empirical  covariance  estimation.  As  a  possible  future  work,  its  performance  may  be  further 
improved  with  more  sophisticated  statistical  instruments,  for  example,  the  stochastic  EM  algorithms  IfTTll 
and  more  advanced  covariance  regularization  methods  ifSTl.  at  a  cost  of  higher  computational  complexity. 
Acknowledgements:  Work  supported  by  NSF,  NGA,  ONR,  ARO  and  NSSEFF.  We  thank  Stephanie  Allassonniere  for 
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helpful  discussions,  in  particular  about  MAP-EM  and  covariance  regularization. 
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